A Polynomial Time Extension of Parallel Multiple Context-Free Grammar

Author

  • Peter Ljunglöf
Abstract

It is already known that parallel multiple context-free grammar (PMCFG) [1] is an instance of the equivalent formalisms simple literal movement grammar (sLMG) [2,3] and range concatenation grammar (RCG) [4,5]. In this paper we show that by adding the single operation of intersection, borrowed from conjunctive grammar [6], PMCFG becomes equivalent to sLMG and RCG. As a corollary, PMCFG with intersection describes exactly the class of languages recognizable in polynomial time.

The layout of this paper is as follows. The first section defines the basic grammar formalisms we are interested in. The second section introduces the intersection operation for PMCFG. The third section contains the main result of the paper: PMCFG extended with the intersection operation is equivalent to simple LMG and RCG. The fourth and last section is a short discussion of the results.

1 GCFG, PMCFG, sLMG and RCG

1.1 Generalized Context-Free Grammar

Generalized context-free grammar (GCFG) was introduced by Pollard in the 1980s as a way of formally describing head grammar [7]. There are several definitions of GCFG in the literature; Seki et al. [1] use a definition similar to Pollard’s original, while others [8,9,10] separate more cleanly between abstract and concrete syntax. However, the latter definitions use the term GCFG for only the abstract part of the grammar, and the term context-free rewriting system for the abstract grammar together with the concrete interpretation function. While Pollard imposed no restriction on the concrete linearization type, other definitions restrict it to tuples of strings. Here we use the definition from [11], which is close to the original definition.

Definition 1 (GCFG, abstract part). The abstract grammar of a GCFG is a tuple (C, S, F, R), where C and F are finite sets of categories and function symbols respectively, S ∈ C is the starting category, and R ⊆ C × F × C∗ is a finite set of context-free syntax rules.
For each function symbol f ∈ F there is an associated context-free syntax rule:

A → f[B1, …, Bδ]

The arity of the rule is δ, and in general we write δ_f for the arity of the rule f. The tree rewriting relation t : A is defined as f(t1, …, tδ) : A whenever t1 : B1, …, tδ : Bδ. We say that a tree t is valid (for a given category A) if t : A.

Example 1. The abstract grammar of a simple fragment of English might look like the following:

S → s_p[NP, VP]
S → s_t[NP, VP]
VP → vp[V, NP]
NP → np[D, N]
D → some[]
D → most[]
N → cat[]
NP → fish[]
V → eat[]
V → catch[]

The idea is that the grammar should be able to handle both normal word order (‘most cats eat fish’) and topicalized sentences (‘it is fish that most cats eat’).

Definition 2 (GCFG, concrete part). To each category A is associated a linearization type A◦, which is not further specified. To each function symbol f is associated a partial linearization function f◦, taking as many arguments as the abstract syntax rule specifies:

f◦ ∈ B1◦ × ⋯ × Bδ◦ → A◦

The linearization [[·]] of syntax trees is defined as

[[f(t1, …, tδ)]] = f◦([[t1]], …, [[tδ]])

if the application is defined.

Note that the definition imposes no restrictions on the linearization types or the linearization functions; this is left to the actual grammar formalism. For our purposes it is enough to view a linearization type as the set of all its possible linearization values. To be able to define the language of a grammar as a set of strings, we demand that the linearization type of the starting category is S◦ = Σ∗. The language of a grammar G then becomes:

L(G) = { [[t]] | t : S }

1.2 Parallel Multiple Context-Free Grammar

Parallel multiple context-free grammar (PMCFG) [1,12] was introduced in the late 1980s by Kasami, Seki et al. as a very expressive formalism, incorporating linear context-free rewriting systems and other mildly context-sensitive formalisms, but still with a polynomial parsing algorithm.
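Before restricting linearizations as PMCFG does, the general machinery of Definitions 1 and 2 can be sketched in Python. This is only an illustration under stated assumptions, not code from the paper: the names `Tree`, `RULES`, `LIN`, `is_valid` and `linearize` are hypothetical, strings are modelled as space-separated Python strings, and, since Definition 2 leaves the linearization functions unspecified, the functions in `LIN` are placeholders chosen so that the two word orders mentioned in Example 1 come out.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Tree:
    """An abstract syntax tree f(t1, ..., tn)."""
    fun: str
    children: tuple = ()


# Abstract rules of Example 1: function symbol -> (category A, argument categories B1..Bn).
RULES = {
    "sp": ("S", ("NP", "VP")), "st": ("S", ("NP", "VP")),
    "vp": ("VP", ("V", "NP")), "np": ("NP", ("D", "N")),
    "some": ("D", ()), "most": ("D", ()), "cat": ("N", ()),
    "fish": ("NP", ()), "eat": ("V", ()), "catch": ("V", ()),
}


def is_valid(tree, cat):
    """t : A holds if the root rule yields A and every child is valid for its category."""
    if tree.fun not in RULES:
        return False
    result, args = RULES[tree.fun]
    return (result == cat
            and len(tree.children) == len(args)
            and all(is_valid(t, b) for t, b in zip(tree.children, args)))


# Assumed linearization functions (hypothetical; Definition 2 does not fix them).
# A verb phrase is linearized to a pair (verb, object), everything else to a string.
LIN = {
    "sp": lambda x, y: f"{x} {y[0]} {y[1]}",
    "st": lambda x, y: f"it is {y[1]} that {x} {y[0]}",
    "vp": lambda x, y: (x, y),
    "np": lambda x, y: f"{x} {y}",
    "some": lambda: "some", "most": lambda: "most", "cat": lambda: "cats",
    "fish": lambda: "fish", "eat": lambda: "eat", "catch": lambda: "catch",
}


def linearize(tree):
    """[[f(t1, ..., tn)]] = f°([[t1]], ..., [[tn]])."""
    return LIN[tree.fun](*(linearize(t) for t in tree.children))


subject = Tree("np", (Tree("most"), Tree("cat")))
predicate = Tree("vp", (Tree("eat"), Tree("fish")))
normal = Tree("sp", (subject, predicate))
topicalized = Tree("st", (subject, predicate))
```

With these assumptions, both `normal` and `topicalized` are valid trees for the category S, and linearizing them gives the normal and the topicalized sentence respectively. A tree such as `Tree("np", (Tree("fish"), Tree("cat")))` is rejected by `is_valid`, because `fish` has category NP rather than D.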
Definition 3 (PMCFG). PMCFG is an instance of GCFG, with the following restrictions on linearizations:

– Linearization types are restricted to tuples of strings. In other words, each PMCFG grammar defines a linearization arity d(C) for each category C. The linearization types can then be defined as C◦ = (Σ∗)^d(C).
– The only allowed operations in linearization functions are tuple projections and string concatenations. In other words, each PMCFG linearization function is of the form

f◦(〈x1,1, …, x1,d1〉, …, 〈xδ,1, …, xδ,dδ〉) = 〈α1, …, αd〉

where each αi is a sequence of variables xj,k and constant strings.

Example 2. The concrete syntax of the example English grammar might look as follows:

s_p◦(x, 〈y1, y2〉) = x y1 y2
s_t◦(x, 〈y1, y2〉) = ‘it is’ y2 ‘that’ x y1
vp◦(x, y) = 〈x, y〉
np◦(x, y) = x y
most◦ = ‘most’
cat◦ = ‘cats’
fish◦ = ‘fish’
eat◦ = ‘eat’
catch◦ = ‘catch’

Note that verb phrases have to consist of two discontinuous phrases for the topicalization to work.

Footnote 1. If the example seems strange, there are other languages (such as German or Swedish) where discontinuous verb phrases are more natural.

1.3 Subclasses of PMCFG

A PMCFG where each variable xi,j occurs in its linearization is called nonerasing. If no variable xj,k occurs twice in a linearization, the grammar is called a linear MCFG (LMCFG or just MCFG). A grammar that is both nonerasing and linear (i.e. each variable occurs exactly once in its linearization) is called a linear context-free rewriting system (LCFRS). The following lemma states that LMCFG and LCFRS are equivalent formalisms [1]:

Lemma 1. Any PMCFG grammar can be converted into an equivalent nonerasing grammar. Furthermore, linearity is preserved by the conversion.

1.4 Literal Movement Grammar and Range Concatenation Grammar

Literal movement grammar (LMG; [2,3]), and its relative range concatenation grammar (RCG; [4,5]), are grammar formalisms based on predicates over string tuples.
A grammar is a collection of clauses for predicates, very similar to Horn clauses in the programming language Prolog. We here define the general formalism of LMG, and then two equivalent subclasses, RCG and simple LMG (sLMG). We assume a given finite set Σ of terminal tokens, and an infinite supply of logical variables x1, x2, … ∈ Var.

Definition 4 (predicate). A predicate is a term A(α1, …, αn), where each αi ∈ (Σ ∪ Var)∗ is a concatenative sequence of terminals and logical variables.

Definition 5 (clause). A clause is of the form

φ ⊢ ψ1, …, ψm

where each of φ, ψ1, …, ψm is a predicate. A clause can be instantiated by substituting a string for each variable in the clause.

A literal movement grammar consists of a finite number of clauses together with a designated start predicate. To define the language of an LMG grammar G, we define a rewriting relation ⇒G on sequences of instantiated predicates,

Γ1, φ, Γ2 ⇒G Γ1, ψ1, …, ψm, Γ2

whenever φ ⊢ ψ1, …, ψm is an instantiation of a clause in G. The language of a grammar is then

L(G) = { w ∈ Σ∗ | S(w) ⇒G∗ ε }

where S is the start predicate in G and ε is the empty sequence of predicates.

Example 3. The example grammar looks as follows in LMG format:

S(x y1 y2) ⊢ NP(x), VP(y1, y2)
S(‘it is’ y2 ‘that’ x y1) ⊢ NP(x), VP(y1, y2)
VP(x, y) ⊢ V(x), NP(y)
NP(x y) ⊢ D(x), N(y)
D(‘most’) ⊢
N(‘cats’) ⊢
NP(‘fish’) ⊢
V(‘eat’) ⊢
V(‘catch’) ⊢

A possible instantiation of the second clause is:

S(‘it is fish that most cats eat’) ⊢ NP(‘most cats’), VP(‘eat’, ‘fish’)

LMG is a very general, Turing-complete grammar formalism. To get a recognizable subclass of LMG, one can consider two possibilities: to restrict the definition of clause instantiation, or to put syntactic restrictions on the form of the predicates.

Definition 6 (RCG). A range concatenation grammar (RCG) is an LMG with a restricted form of clause instantiation. A clause can only be instantiated by substrings of the given input string; i.e. if φ ⊢ ψ1, …, ψm is an instantiation of a clause, then all arguments to φ, ψ1, …, ψm are substrings of the input.

By only allowing instantiations by substrings of the input, we ensure that all strings in an RCG can be replaced by pairs of input positions, called ranges. This has the effect that RCG parsing is polynomial in the length of the input string.

Example 4. If the input string is ‘b a c h’, then of the following clauses,

A(bac) ⊢ B(b), C(c)
A(bach) ⊢ B(b), C(ch)
A(back) ⊢ B(b), C(ck)

the first two are RCG instantiations of the clause A(x a z) ⊢ B(x), C(z), but the third is not.

Definition 7 (sLMG). A simple LMG (sLMG) is an LMG where each clause obeys the following three syntactic restrictions:

– Non-combinatorial (NC): The arguments of each ψi are variables.
– Bottom-up nonerasing (BNE): All variables in each ψi also occur in φ.
– Bottom-up linear (BL): No variable occurs more than once in φ.

Strictly speaking, bottom-up linearity is not a necessary condition, as the following lemma states:

Lemma 2. Any LMG clause can be converted to an equivalent bottom-up linear (BL) clause. Furthermore, the conversion preserves NC and BNE.

Proof (taken from [2,3]). Assume that the clause in question is φ ⊢ ψ1, …, ψm, and that there is a variable x occurring twice in φ. Replace one occurrence by a new variable x′, and add a call to the bottom-up linear predicate Eq(x, x′), with the following definition:

Eq(ε, ε) ⊢
Eq(s x, s y) ⊢ Eq(x, y)   (for each s ∈ Σ)

The new clause φ ⊢ ψ1, …, ψm, Eq(x, x′) is equivalent to the original, since the predicate call Eq(x, x′) says that x and x′ are equal strings. The conversion preserves NC, since the predicate Eq(x, x′) is non-combinatorial. Furthermore, it preserves BNE, since the only variable that is introduced on the left-hand side (x′) is also introduced on the right-hand side. □

Footnote 2. Boullier [4,5] defines RCG directly on ranges, but our definition is equivalent.
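The substring restriction of Definition 6 and the clause from Example 4 can be illustrated with a small brute-force enumeration in Python. This is a sketch under stated assumptions, not an RCG parser: tokens are modelled as single characters, a clause is a tuple of (predicate name, arguments) pairs with the head first, each argument is a tuple of tokens, and the names `substrings`, `instantiate` and `rcg_instantiations` are hypothetical. An actual recognizer would work on ranges (pairs of input positions) rather than enumerate substitutions.

```python
from itertools import product


def substrings(w):
    """All substrings of w, including the empty string."""
    return {w[i:j] for i in range(len(w) + 1) for j in range(i, len(w) + 1)}


def instantiate(pattern, env):
    """Substitute strings for the variables in a tuple of tokens; other tokens are terminals."""
    return "".join(env.get(tok, tok) for tok in pattern)


def rcg_instantiations(clause, variables, w):
    """Enumerate the instantiations of a clause whose arguments are all substrings of w."""
    subs = substrings(w)
    result = []
    for values in product(sorted(subs), repeat=len(variables)):
        env = dict(zip(variables, values))
        inst = tuple(
            (name, tuple(instantiate(arg, env) for arg in args))
            for name, args in clause
        )
        # Definition 6: every argument of every predicate must be a substring of the input.
        if all(a in subs for _, args in inst for a in args):
            result.append(inst)
    return result


# The clause A(x a z) |- B(x), C(z) from Example 4, head predicate first.
CLAUSE = (
    ("A", (("x", "a", "z"),)),
    ("B", (("x",),)),
    ("C", (("z",),)),
)
INSTANCES = rcg_instantiations(CLAUSE, ("x", "z"), "bach")
```

For the input `"bach"` the enumeration contains the first two instantiations of Example 4, while nothing with head argument `back` appears, since `back` is not a substring of the input.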
The formalisms sLMG and RCG are equivalent, since both describe exactly the class of languages recognizable in polynomial time [2,3,4,5,13]. Note that sLMG/RCG are closed under intersection; if S1 and S2 are the start predicates of G1 and G2, then

S(x) ⊢ S1(x), S2(x)

defines the intersection of the languages L(G1) and L(G2).

1.5 PMCFG is an Instance of sLMG/RCG

Assume given the following PMCFG rule:

A → f[B1, …, Bδ]
f◦(〈x1,1, …, x1,n1〉, …, 〈xδ,1, …, xδ,nδ〉) = 〈α1, …, αn〉

By Lemma 1, we can assume that the linearization is nonerasing. Furthermore, it is straightforward to convert a nonerasing PMCFG grammar into an equivalent sLMG grammar, as shown in [2,3]. Each rule above is converted to the clause:

A(α1, …, αn) ⊢ B1(x1,1, …, x1,n1), …, Bδ(xδ,1, …, xδ,nδ)

Note that this clause is NC (since each of the xi,j is a variable) and BNE (since f◦ is nonerasing), and therefore the clause is sLMG (bottom-up linearity can be obtained by Lemma 2).

2 The Intersection Operation

There is an extension of context-free grammar called conjunctive grammar [6], where the right-hand sides of rules are extended with a new intersection operator. A conjunctive context-free rule is written:

A → α1 & … & αn

where αi ∈ (N ∪ Σ)∗. The informal interpretation is that A can be rewritten to w ∈ Σ∗ iff every αi can be rewritten to w. This operation can be directly transferred to PMCFG linearizations.

Definition 8 (intersection). The intersection operation is a partial linearization operation, defined as follows: φ1 & φ2 evaluates to φ1 iff φ1 = φ2, and is undefined otherwise.

This definition can be made formal by lifting the linearization types to sets of linearization values, where a unit set denotes the existence of a linearization and the empty set denotes an undefined linearization. String concatenation, tuple forming and tuple projection are straightforwardly lifted to this domain. The definition of the intersection operation then simply becomes set intersection.
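The lifting described above can be sketched in Python with `frozenset` values. This is a minimal illustration under stated assumptions: strings are plain Python strings, every lifted value is either the empty set or a unit set, and the names `unit`, `lift`, `concat`, `pair`, `first`, `intersect` and the example function `commuting` are hypothetical and do not come from the paper.

```python
EMPTY = frozenset()  # denotes an undefined linearization


def unit(value):
    """A unit set, denoting an existing linearization."""
    return frozenset({value})


def lift(f):
    """Lift an ordinary operation to sets: undefined if any argument is undefined."""
    def lifted(*args):
        if any(not a for a in args):
            return EMPTY
        # Each argument is assumed to be a unit set, so take its single element.
        return unit(f(*(next(iter(a)) for a in args)))
    return lifted


concat = lift(lambda x, y: x + y)  # string concatenation
pair = lift(lambda x, y: (x, y))   # tuple forming
first = lift(lambda p: p[0])       # tuple projection


def intersect(a, b):
    """The intersection operation of Definition 8, as plain set intersection."""
    return a & b


def commuting(x, y):
    """Example conjunctive linearization (x y) & (y x): defined only when x y = y x."""
    return intersect(concat(x, y), concat(y, x))
```

Under these assumptions, `commuting(unit("ab"), unit("abab"))` is the unit set containing `"ababab"`, while `commuting(unit("a"), unit("b"))` is empty, and that undefinedness propagates through any further lifted operation applied to it.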
We call PMCFG extended with the intersection operation conjunctive PMCFG. The following laws for intersections of linearizations are simple consequences of the formal definition:




Publication date: 2005